Downloading winning-args-corpus to /root/.convokit/downloads/winning-args-corpus
Downloading winning-args-corpus from http://zissou.infosci.cornell.edu/convokit/datasets/winning-args-corpus/winning-args-corpus.zip (73.7MB)... Done
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cnhre1n has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cnhs1jf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cn7mmnt has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cn66mck has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cmt6w97 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cmsgxzm has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cmsitjr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cmqqd49 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cmqsp80 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cmr57l3 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cmihu76 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cm85qp1 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cm6tktt has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_clz6jxt has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cm09icp has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cly0oho has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cly8wzq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_clxpe30 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_clv6oas has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cls8zr2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_clj3tcp has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_clhtmr0 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cler134 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_claiask has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl96qxj has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl92hgd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl92jdc has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl7lil7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_claaeq7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl6xkrl has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl6yu6h has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl6ywwj has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl6yxf2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl6u197 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl65sus has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl67wux has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl5n0e5 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl55jtf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl53psw has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl54hde has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl5490t has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl38tmp has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cl2rq86 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ckujlyq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ckujdib has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cklsm40 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ckmghf2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cknmezc has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ckj2g0i has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ck8j9e4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ck8cwb6 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cjxlz5l has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cjntq7s has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cjnp5hy has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cjiqn1e has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cjh7hr0 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cjh6yab has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cjcd62j has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cj8l1kv has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cj8zjae has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cj7yiy4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cj7o50t has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cj50shk has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cizh6xb has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cj0ukb7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ciz9aka has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cize32v has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_civr4mn has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cip166t has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cil3zav has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cijy9lk has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cijcups has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cifu4zp has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cid30or has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ci8wo4e has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ci9fbtw has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ci2pixf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ci2pcku has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ci2cc99 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_chz0gqi has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_chw609b has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_chv6j3s has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_chv26as has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_churmo0 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_chswmlq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_chsxa75 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_chnlrqw has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_chl8bsy has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_chiw401 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_chihwy1 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_chh0sby has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_chbrqn7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ch7rh7p has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ch6watf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ch6ssui has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ch61v2p has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ch3ng8c has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cgrm70r has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cgrz99m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cgsegx6 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cgnj6j9 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ch073vt has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cgkn5ae has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cgiihu5 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cgieyuu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cgig23g has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cgie34t has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cgh4ifl has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cggr63w has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cggd3v7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cggutk5 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cgeiql6 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cgej6dy has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cge3th4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cgedp3m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cg8kgbj has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cfwc45l has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cfv3enu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cfpx424 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cfl78p1 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cfilsz7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cfj8wvg has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cffznn5 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cffo8l7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cfdq2z7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cfc9c4b has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cfcrvtf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cfbxgor has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cfb4q61 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cfb5l7x has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cf99z8w has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cf2h04o has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ceptv5v has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ceo2hjj has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ceorou4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cenyp1x has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cemye2y has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_celsimn has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ceehosv has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cebrznc has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ceconce has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cebnxzk has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cebd6e3 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cec3nwt has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_chv7nyk has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cedktpa has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ce9dob3 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ce5q8tj has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ce4gt8u has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ce0p904 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cdosehh has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cdiarjx has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cda83lr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cd759lq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cd4pzb7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cd4p7s1 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cd4zoy7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cdn4qk9 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ccue3mg has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ccphf6t has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ccphsld has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ccmc12f has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ccjdtdf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cciu2c8 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ccizwjs has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ccivs3y has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ccj0iqr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cciupg2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cch8ibq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ccgulma has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ccgulwv has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ccgnj8i has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cch2tad has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cch9i4l has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cch9wwe has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ccgvmn6 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cce35j0 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ccdamor has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cce8w7z has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cce42qt has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cc9x38t has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cc302nb has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbyfuze has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbvq8lh has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbrec2x has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbrmg6l has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbq8ij7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbp1a4b has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cboztg2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbou2b5 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbpau3m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbokj1o has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cboz1er has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbmr775 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cblsags has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbqpqwi has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb9lo66 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb9lrsq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb9zx65 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb8beut has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb7zk0m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb6c4sq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb33ath has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb1kfhi has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb0ahbi has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb0bzr4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cawc0b9 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cawjc4y has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_caxt2vf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_caukmyp has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_caurnnc has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cavlgpq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_caulblj has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cauridr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_casyucd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_catl342 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_camzwda has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_camssul has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_can6u3l has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_calik3u has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_caljueg has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cajb0py has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cajbvqf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_caajlk6 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_caachmu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cabc5l7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9xkwd1 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9za8qk has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9x7xos has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9zr42i has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9tlmhq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9r6a05 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ca0tpgp has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9r7py2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9rnobm has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9rp0so has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9qubp4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9qwaar has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9pfs2o has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9lc68q has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9jlurv has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9hall9 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9gkb83 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9bvjlq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c98s9ip has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c98o57y has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c99ahf3 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c990rx6 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c97eoi9 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c97acob has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c97ah5c has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c95q7s3 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c95kdch has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c95l9ml has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqzqi7q has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqzmedm has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cr13cnd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqzr1lp has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqyom18 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqybmxe has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqy7ahu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqybfct has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqyuen2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqy83y1 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqxoxi5 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqn9nyq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqnhdk2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqn3mch has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqn7pac has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqmw3di has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqf8ryl has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqf8mpf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqfb5xa has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqfhjkq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqgarpy has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqdccrd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqctlqr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqcthds has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqkelp2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqd8j90 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqd8a8j has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqcrdrw has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cq8xz69 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cq97x51 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cq8zb08 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cq959on has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cq9ci0g has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cq97u6v has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cptu8ww has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpstv77 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpsxhki has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpswzfn has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpuguvq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpvfq3o has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cptriew has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpsxudr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpszfeh has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpqft77 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpqfs0u has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cprk62o has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpnulph has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpnn77o has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpnju02 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpn5p75 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpni19v has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpo4x6m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpbijg5 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpbfv95 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coz6meu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coz24er has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coz3omg has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_copv0tz has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cor1wk2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coq0wpg has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coqe7nl has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokywqo has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokwcae has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokx8ik has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokyzdf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_col2sjk has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokxscn has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cojw16m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coezsna has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cof1lb9 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cohpfyt has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cof4o1h has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coeyzd4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coai23q has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coa8l6z has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coa80dz has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coa8l84 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co68zqm has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6a4yb has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6ef8m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6ba44 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6rpqr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co69x71 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6a13x has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cnq3mwr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cnq4ycs has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cnqs4bp has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cunpy0j has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cuior6e has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cufvp2u has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu7mdqo has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu7xcz4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu5cv9a has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu3h1wa has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu272pa has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu0qopb has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctzro35 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctxzpb4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctwh70b has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctqswb6 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctqjqcd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctoncx7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctq5xed has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctjjz2v has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cti68mr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctiqywc has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cthu9hx has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctdqpne has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctdkusf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctdhx1w has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctcfj2v has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct5ul8m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct5m7v4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct4klhi has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct3sgfd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct5bruu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct1rmby has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cstl2de has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cstboz7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cstbdw5 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct7dtjh has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csk5kbs has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctam9yz has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csh3dqw has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csislrs has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csed3sw has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctalrap has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csernqu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csdn1ui has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cs8mmtf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cs4re28 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctb2nf1 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cs1tl9q has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crul067 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crusrpd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crjkijn has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cre72f1 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_creewvt has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_creh7rb has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crctrfy has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crbkihy has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cra0ei4 has been casted to a string.
Encountered erroneus tree! Skipping..
Encountered erroneus tree! Skipping..
Encountered erroneus tree! Skipping..
Encountered erroneus tree! Skipping..
Number of threads in dataset: 120031
--------------EPOCH 1-------------
Test Accuracy: tensor(0.6026, device='cuda:0')
Loss: tensor(0.6008, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.6710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.8487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.8121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.8164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 20000
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
--------------EPOCH 2-------------
Test Accuracy: tensor(0.9100, device='cuda:0')
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 20000
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
--------------EPOCH 3-------------
Test Accuracy: tensor(0.9397, device='cuda:0')
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 20000
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
--------------EPOCH 4-------------
Test Accuracy: tensor(0.9500, device='cuda:0')
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 20000
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
--------------EPOCH 5-------------
Test Accuracy: tensor(0.9564, device='cuda:0')
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Downloading winning-args-corpus to /root/.convokit/downloads/winning-args-corpus
Downloading winning-args-corpus from http://zissou.infosci.cornell.edu/convokit/datasets/winning-args-corpus/winning-args-corpus.zip (73.7MB)... Done
WARNING: Utterance text must be a string: text of utterance with ID: t1_cnhre1n has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cnhs1jf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cn7mmnt has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cn66mck has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cmt6w97 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cmsgxzm has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cmsitjr has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cmqqd49 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cmqsp80 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cmr57l3 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cmihu76 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cm85qp1 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cm6tktt has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_clz6jxt has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cm09icp has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cly0oho has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cly8wzq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_clxpe30 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_clv6oas has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cls8zr2 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_clj3tcp has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_clhtmr0 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cler134 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_claiask has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl96qxj has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl92hgd has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl92jdc has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl7lil7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_claaeq7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl6xkrl has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl6yu6h has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl6ywwj has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl6yxf2 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl6u197 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl65sus has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl67wux has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl5n0e5 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl55jtf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl53psw has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl54hde has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl5490t has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl38tmp has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl2rq86 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ckujlyq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ckujdib has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cklsm40 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ckmghf2 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cknmezc has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ckj2g0i has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ck8j9e4 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ck8cwb6 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cjxlz5l has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cjntq7s has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cjnp5hy has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cjiqn1e has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cjh7hr0 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cjh6yab has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cjcd62j has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cj8l1kv has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cj8zjae has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cj7yiy4 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cj7o50t has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cj50shk has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cizh6xb has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cj0ukb7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ciz9aka has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cize32v has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_civr4mn has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cip166t has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cil3zav has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cijy9lk has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cijcups has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cifu4zp has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cid30or has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ci8wo4e has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ci9fbtw has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ci2pixf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ci2pcku has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ci2cc99 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chz0gqi has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chw609b has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chv6j3s has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chv26as has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_churmo0 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chswmlq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chsxa75 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chnlrqw has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chl8bsy has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chiw401 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chihwy1 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chh0sby has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chbrqn7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ch7rh7p has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ch6watf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ch6ssui has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ch61v2p has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ch3ng8c has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgrm70r has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgrz99m has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgsegx6 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgnj6j9 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ch073vt has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgkn5ae has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgiihu5 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgieyuu has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgig23g has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgie34t has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgh4ifl has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cggr63w has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cggd3v7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cggutk5 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgeiql6 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgej6dy has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cge3th4 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgedp3m has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cg8kgbj has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfwc45l has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfv3enu has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfpx424 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfl78p1 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfilsz7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfj8wvg has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cffznn5 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cffo8l7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfdq2z7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfc9c4b has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfcrvtf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfbxgor has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfb4q61 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfb5l7x has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cf99z8w has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cf2h04o has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ceptv5v has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ceo2hjj has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ceorou4 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cenyp1x has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cemye2y has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_celsimn has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ceehosv has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cebrznc has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ceconce has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cebnxzk has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cebd6e3 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cec3nwt has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chv7nyk has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cedktpa has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ce9dob3 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ce5q8tj has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ce4gt8u has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ce0p904 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cdosehh has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cdiarjx has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cda83lr has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cd759lq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cd4pzb7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cd4p7s1 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cd4zoy7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cdn4qk9 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccue3mg has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccphf6t has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccphsld has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccmc12f has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccjdtdf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cciu2c8 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccizwjs has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccivs3y has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccj0iqr has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cciupg2 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cch8ibq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccgulma has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccgulwv has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccgnj8i has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cch2tad has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cch9i4l has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cch9wwe has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccgvmn6 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cce35j0 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccdamor has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cce8w7z has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cce42qt has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cc9x38t has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cc302nb has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbyfuze has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbvq8lh has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbrec2x has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbrmg6l has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbq8ij7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbp1a4b has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cboztg2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbou2b5 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbpau3m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbokj1o has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cboz1er has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbmr775 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cblsags has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cbqpqwi has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb9lo66 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb9lrsq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb9zx65 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb8beut has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb7zk0m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb6c4sq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb33ath has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb1kfhi has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb0ahbi has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cb0bzr4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cawc0b9 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cawjc4y has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_caxt2vf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_caukmyp has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_caurnnc has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cavlgpq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_caulblj has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cauridr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_casyucd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_catl342 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_camzwda has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_camssul has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_can6u3l has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_calik3u has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_caljueg has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cajb0py has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cajbvqf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_caajlk6 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_caachmu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cabc5l7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9xkwd1 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9za8qk has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9x7xos has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9zr42i has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9tlmhq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9r6a05 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ca0tpgp has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9r7py2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9rnobm has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9rp0so has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9qubp4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9qwaar has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9pfs2o has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9lc68q has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9jlurv has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9hall9 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9gkb83 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c9bvjlq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c98s9ip has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c98o57y has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c99ahf3 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c990rx6 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c97eoi9 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c97acob has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c97ah5c has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c95q7s3 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c95kdch has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_c95l9ml has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqzqi7q has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqzmedm has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cr13cnd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqzr1lp has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqyom18 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqybmxe has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqy7ahu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqybfct has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqyuen2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqy83y1 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqxoxi5 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqn9nyq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqnhdk2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqn3mch has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqn7pac has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqmw3di has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqf8ryl has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqf8mpf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqfb5xa has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqfhjkq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqgarpy has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqdccrd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqctlqr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqcthds has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqkelp2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqd8j90 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqd8a8j has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cqcrdrw has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cq8xz69 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cq97x51 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cq8zb08 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cq959on has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cq9ci0g has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cq97u6v has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cptu8ww has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpstv77 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpsxhki has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpswzfn has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpuguvq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpvfq3o has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cptriew has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpsxudr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpszfeh has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpqft77 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpqfs0u has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cprk62o has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpnulph has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpnn77o has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpnju02 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpn5p75 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpni19v has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpo4x6m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpbijg5 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpbfv95 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coz6meu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coz24er has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coz3omg has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_copv0tz has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cor1wk2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coq0wpg has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coqe7nl has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokywqo has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokwcae has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokx8ik has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokyzdf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_col2sjk has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokxscn has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cojw16m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coezsna has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cof1lb9 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cohpfyt has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cof4o1h has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coeyzd4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coai23q has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coa8l6z has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coa80dz has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coa8l84 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co68zqm has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6a4yb has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6ef8m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6ba44 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6rpqr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co69x71 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6a13x has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cnq3mwr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cnq4ycs has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cnqs4bp has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cunpy0j has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cuior6e has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cufvp2u has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu7mdqo has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu7xcz4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu5cv9a has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu3h1wa has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu272pa has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu0qopb has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctzro35 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctxzpb4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctwh70b has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctqswb6 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctqjqcd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctoncx7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctq5xed has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctjjz2v has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cti68mr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctiqywc has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cthu9hx has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctdqpne has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctdkusf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctdhx1w has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctcfj2v has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct5ul8m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct5m7v4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct4klhi has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct3sgfd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct5bruu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct1rmby has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cstl2de has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cstboz7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cstbdw5 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct7dtjh has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csk5kbs has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctam9yz has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csh3dqw has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csislrs has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csed3sw has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctalrap has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csernqu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csdn1ui has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cs8mmtf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cs4re28 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctb2nf1 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cs1tl9q has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crul067 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crusrpd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crjkijn has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cre72f1 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_creewvt has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_creh7rb has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crctrfy has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crbkihy has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cra0ei4 has been casted to a string.
Encountered erroneus tree! Skipping..
Encountered erroneus tree! Skipping..
Encountered erroneus tree! Skipping..
Encountered erroneus tree! Skipping..
Number of threads in dataset: 120031
--------------Restarting: Beginning EPOCH 5-------------
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Test Accuracy: 0.9555635405437777
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 20000
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
--------------Restarting: Beginning EPOCH 6-------------
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
[previous line repeated 620 more times]
Test Accuracy: 0.958438136303749
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 20000
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
--------------Restarting: Beginning EPOCH 7-------------
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Test Accuracy: 0.9601149838303988
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.7562e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.5312e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.7208e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(9.9829e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(9.1980e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 20000
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
--------------Restarting: Beginning EPOCH 8-------------
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Test Accuracy: 0.9598754341837346
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.5487e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.8300e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(9.6844e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(7.1924e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(6.8245e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(6.9241e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.1228e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(7.9955e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(7.4581e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 20000
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
--------------Restarting: Beginning EPOCH 9-------------
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step! (line repeated 610 times)
Test Accuracy: 0.961073182417056
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(7.2945e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(7.5683e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(7.9217e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(6.1066e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(5.7767e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(6.0418e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(6.9957e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(6.6950e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(6.0704e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 20000
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
--------------Restarting: Beginning EPOCH 10-------------
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Ran a evaluation step!
Test Accuracy: 0.9608336327703917
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(6.1850e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(9.3210e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(6.4228e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(9.3342e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(9.2265e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(6.9775e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.9360e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(9.6922e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(5.5993e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(5.6183e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(5.8117e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(6.0479e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(6.1825e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(5.5101e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.7150e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.9735e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.8530e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.8530e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.8530e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.7212e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.7212e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.7212e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.5806e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.5806e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.5806e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.4340e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.4340e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.4340e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.2835e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.2835e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.2835e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.1313e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(8.1313e-05, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 20000
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0029, device='cuda:0', grad_fn=<NllLossBackward>)
